Household behavior and vulnerability to acute malnutrition in Kenya

Anticipating those most at risk of being acutely malnourished significantly shapes decisions that pertain to resource allocation and intervention in times of food crises. Yet the assumption that household behavior in times of crisis is homogeneous (that households share the same capacity to adapt to external shocks) ostensibly prevails. This assumption fails to explain why, in a given geographical context, some households remain more vulnerable to acute malnutrition than others, and why a given risk factor may have a differential effect across households. In an effort to explore how variation in household behavior influences vulnerability to malnutrition, we use a unique household dataset that spans 23 Kenyan counties from 2016 to 2020 to seed, calibrate, and validate an evidence-driven computational model. We use the model to conduct a series of counterfactual experiments on the relationship between household adaptive capacity and vulnerability to acute malnutrition. Our findings suggest that households are differently impacted by given risk factors, with the most vulnerable households typically being the least adaptive. These findings further underscore the salience of household adaptive capacity, in particular that adaptation is less effective for economic vis-à-vis climate shocks. By making explicit the link between patterns of household behavior and vulnerability in the short to medium term, we underscore the need for famine early warning to better account for variation in household-level behavior.


A.1 Household nutrition surveys
In Kenya, child anthropometric information is most readily available from SMART surveys and the Kenya Demographic Health Surveys (KDHS), specifically for the years 2003, 2008/9, 2014, and 2021. Whereas the KDHS lacks temporal granularity, SMART surveys are conducted more frequently, albeit with geographical coverage limited to a single county. In a meta-analysis, Brown et al. (2020) identified only five studies in which anthropometric data were taken repeatedly from the same child. In this regard, the household nutrition surveys from the Kenyan National Drought Management Authority (NDMA) are exceptional for their longitudinal coverage. Each month, NDMA samples a small subset of all wards (ADM3), intended to constitute a representative sample for a county (ADM1), and then, in turn, samples households to represent each ward. In combination with open-source data on stressors that are also observed at monthly intervals, our study highlights the value of longitudinal nutrition information. Table S1 reports the number of observations, i.e., households and mid-upper arm circumference (MUAC) measurements of individual children, per year and study site in West Pokot. We calculate child-level statistics rather than household-level statistics. In practice, the two are nearly equivalent, as typically only one child in a given household, rather than all of them, is surveyed at a time. Nutrition prevalence rates are the fraction of children below the normative wasting threshold in a given sample. We do not expect missing MUAC data to skew our results: it affects only 0.6% of observations, spread proportionally across spatial and temporal units (standard deviation of % missing: σ_ADM3 = 1%, σ_Years = 0.3%). Missingness on other household covariates, however, reaches up to 88% (illness). We impute these values by aggregating household covariates to the ward level, using appropriate rules depending on the scale of measurement.
In addition, given the sampling intervals, we calculate any malnutrition statistics only for intervals of three months or longer. This ensures that our measures are certain to cover a full sampling interval and, thus, are not driven by (potentially) biased monthly sub-samples. Finally, we calculate the statistical mean of nutrition prevalence to aggregate from the child to the ward level. Continuous prevalence rates are reported on the five-point IPC/AMN scale. We note potential limitations regarding the MUAC measurement, as there is significant regional variation. Some argue that the MUAC cannot be meaningfully collapsed into IPC categories at all; others do not distinguish between lower IPC categories. In this study, we use similar cut-off points for the MUAC as we would for the WHZ.
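The aggregation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 125 mm MUAC cut-off, the phase boundaries (5%, 10%, 15%, 30%), and the function names `ward_prevalence` and `ipc_amn_phase` are assumptions introduced here. Pooling records into intervals of three months or longer is left to the caller.

```python
from collections import defaultdict
from statistics import mean

# Assumption: MUAC wasting threshold in millimetres (not stated in the text).
WASTING_MUAC_MM = 125

# Assumption: lower bounds (inclusive) of the five IPC/AMN phases on prevalence.
IPC_AMN_BOUNDS = [(0.30, 5), (0.15, 4), (0.10, 3), (0.05, 2), (0.0, 1)]


def ward_prevalence(records):
    """Aggregate child-level MUAC records to ward-level prevalence.

    records: iterable of (ward, muac_mm); muac_mm may be None when missing.
    Returns {ward: fraction of measured children below the threshold}.
    """
    flags = defaultdict(list)
    for ward, muac in records:
        if muac is None:  # skip missing MUAC observations
            continue
        flags[ward].append(1.0 if muac < WASTING_MUAC_MM else 0.0)
    return {ward: mean(values) for ward, values in flags.items()}


def ipc_amn_phase(prevalence):
    """Map a continuous prevalence rate (0..1) to a five-point phase."""
    for lower, phase in IPC_AMN_BOUNDS:
        if prevalence >= lower:
            return phase
    raise ValueError("prevalence must be non-negative")
```

Because the mean of a 0/1 indicator equals the share of children below the threshold, the same function covers both the prevalence definition and the child-to-ward aggregation.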

A.2 Economic stressor data
To provide an indication of economic stressors in West Pokot, we use data on key food commodities from NDMA's 'County Early Warning Bulletins'. Complementary to the nutrition surveys, the NDMA reports the average monthly retail prices of key staple commodities, such as maize, wheat, rice, or milk. We analyze changes in the cost of maize specifically, as this information is most readily available for the largest number of wards over time. We first calculate monthly deviation rates from a three-year average and categorize each month as stressed or not. Second, we define the intensity of market stressors as the mean price increase above the long-term average (LTA), and z-standardize such that values closer to 0 represent the lowest and values closer to 1 the highest prices in a given year. Figure S1 illustrates the market stressor intensity in West Pokot between 2016 and 2020.
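A minimal sketch of these two steps is given below. Several choices are assumptions made for illustration only: a month counts as stressed when its price exceeds the baseline (threshold 0), intensity is the mean deviation over stressed months, and the final rescaling to the 0 to 1 range is implemented as a min-max transformation. The function names are hypothetical.

```python
from statistics import mean


def deviation_rates(prices, baseline):
    """Relative deviation of each monthly price from a baseline average."""
    return [(p - baseline) / baseline for p in prices]


def is_stressed(deviations, threshold=0.0):
    """Flag months whose deviation exceeds the (assumed) threshold."""
    return [d > threshold for d in deviations]


def stress_intensity(deviations, threshold=0.0):
    """Mean price increase above the baseline over stressed months only."""
    increases = [d for d in deviations if d > threshold]
    return mean(increases) if increases else 0.0


def rescale_unit(values):
    """Min-max rescaling: lowest value maps to 0, highest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant series: no variation to rescale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For example, `deviation_rates([40, 50, 60], 50.0)` flags only the third month as stressed, and `rescale_unit` applied to a year's intensities places that year's cheapest period at 0 and its most expensive at 1.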

A.3 Climate stressor data
To model the impact of exogenous climate stressors, we use information on the seasonal normalized difference vegetation index (NDVI) from the MODIS Vegetation Index (MOD13A3) (Didan, Munoz, Solano, & Huete, 2015). The NDVI describes 'land fertility' for a region, with the greenest area being the most food abundant. Therefore, the higher the value of vegetation and soil fertility, the more productive local agricultural and pastoral activity. Using this information, we compute anomalies by comparing (i) each month's statistical value for each area with the average value for all years in that month, and (ii) each month's statistical value for each area with the average value for the previous five years in that month. The approach enables us (i) to calculate the intensity of climate stressors and assess their influence on resilience to acute malnutrition in the statistical model, and (ii) to explicitly account for seasonal variations in the counterfactual experiment. To assess the sensitivity of our conceptualization of seasonal influences in the counterfactual analysis, we compare the observed NDVI in four wards in West Pokot between 2000 and 2019 to two alternative indicators: temperature and precipitation. As illustrated in Figure S2, we find only small differences among the three indicators and therefore use the NDVI.
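The two anomaly definitions can be sketched for a single area as follows. The anomaly is taken here as a simple difference between the observed value and the reference mean, which is an assumption; the function name `monthly_anomalies` and the dictionary layout are likewise hypothetical.

```python
from statistics import mean


def monthly_anomalies(ndvi, window=None):
    """Compute NDVI anomalies for one area.

    ndvi: {(year, month): value}.
    window=None -> reference is the mean of that calendar month over all years.
    window=k    -> reference is the mean of the same month in the previous k
                   years; months without a full window are skipped.
    Returns {(year, month): value - reference}.
    """
    anomalies = {}
    for (year, month), value in ndvi.items():
        if window is None:
            ref_years = [y for (y, m) in ndvi if m == month]
        else:
            ref_years = [y for y in range(year - window, year) if (y, month) in ndvi]
            if len(ref_years) < window:
                continue
        reference = mean(ndvi[(y, month)] for y in ref_years)
        anomalies[(year, month)] = value - reference
    return anomalies
```

Calling the function with `window=None` corresponds to comparison (i), and with `window=5` to comparison (ii).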

B Methods
B.1 Computational model development
Figure S3 illustrates the six-step development process of our evidence-driven computational model as specified in Bhavnani, Donnay, and Reul (2020). Extensive pre-testing and theoretical and empirical validation result in the design of a model "prototype" that serves as a best approximation for West Pokot and may be generalizable to similar cases, further tailored by weighting specific causal drivers, behavioral parameters, and mechanisms. See the next section, B.2, for an alternative model specification tailored to predominantly pastoralist contexts, for instance Turkana county.

B.2 Alternative model specification
To capture the influence of pastoralism as a predominant livelihood strategy in Turkana county, the model is fine-tuned by means of two specific adjustments.
• First, we update the theoretical assumptions for the strategy set (S) in the model, tailoring it better to pastoralist households. We assume that pastoralist households use distinct strategies to cope with acute malnutrition relative to households that rely on agro-pastoralism as their main livelihood. While agro-pastoralist households rely mainly on the diversification of production and income, for example planting other sorts of crops or beekeeping, pastoralist households are mainly reliant on livestock, which also increases the likelihood of migration and overall mobility. One could assume that pastoralist households have a larger set of coping strategies available. Yet local experts confirmed that such strategies are increasingly hard to realize for pastoralists due to their reliance on livestock herding. The semi-arid lands in Turkana do not enable pastoralist households to engage in agricultural or fishing activities, effectively reducing the range of coping strategies available to them. We therefore assume that pastoralists have a smaller set of strategies to cope with acute malnutrition relative to households with other livelihoods. In the model, the set of available strategies can thus be reduced by a factor ranging from 0 to 1, where 1 represents a 100% reduction in the household's ability to cope with external shocks.
• Second, we fine-tune the parameter for household assets (h_a) in the model. As pastoralist households are not sedentary, given their search for fertile herding grounds, they accumulate fewer assets relative to more settled households. We seed this variable with cross-sectional data on average wealth or food-related household assets and simplify it to be binary, such that a household is either poor (in absolute terms) or not. h_a then weights the household food source variable (h_s). In this manner, we avoid further complicating the model dynamics while accounting for heterogeneity in the ability of pastoralist households to acquire food, weighted by assets.
By means of fine-tuning the model as outlined above, we are able to capture the increased vulnerability of pastoralists relative to households with other livelihood strategies, keeping the main mechanisms of the model unchanged.

B.3 Qualitative model validation procedures
As part of our efforts to refine the model, we conducted fieldwork in West Pokot county in July 2019. This 'ground-truthing' exercise was essential to ensure that relevant household behavioral dynamics are accurately reflected in our model. With the help of ACF Kenya, we were able to visit four sub-counties in West Pokot: North Pokot (Kacheliba), South Pokot (Senetwo and Chepareria), West Pokot (Kishaunet and Kapenguria), and Central Pokot (Sigor). ACF Kenya selected and mobilized experts and communities based on priorities that we agreed on in advance. Key selection criteria included heterogeneity in expertise and livelihood strategies, respectively. We administered three types of qualitative data collection instruments during our visit:
• Focus group discussions. We conducted three focus group discussions (FGDs) in West Pokot: one with agro-pastoralists in Chepareria, the second with a mother-to-mother support group in Senetwo, and the third with a group of pastoralists in Kacheliba. The average group size was approximately 20 individuals. Due to the high number of participants and their active contributions, we had to adjust our initial set of questions so as not to run over time and overburden the respondents. Thus, during the 60-minute discussion, we focused mostly on households' understandings of malnutrition causes and processes, the main influencing factors, e.g., seasonality, as well as possible coping strategies. The discussions gave us diverse community viewpoints on household understandings of malnutrition causes and processes, the impact of environmental factors, as well as possible coping strategies.
• Individual interviews. Parallel to the FGDs, we conducted an individual interview with an elected chairperson of each group. We asked the community to agree on one person whom they wanted to nominate for the individual interview based on their level of experience and status within the community. With these in-depth interviews, we aimed to achieve a deeper understanding of the individual assessments of malnutrition dynamics, as well as the implications of shocks and stressors for specific individuals who are influential within a community and thus likely to be able to influence individual behavior. Our interviewees included the chairwomen of the agro-pastoralists in Chepareria, the mother-to-mother support group in Senetwo, and the group of pastoralists in Kacheliba. The questionnaire consisted of approximately 50 questions structured according to five categories, ranging from hunger, migration and seasonality, to local infrastructure and WASH. As with the FGDs, administration took approximately 60 minutes.
• Expert interviews: standardized questionnaires. We conducted 11 expert interviews and subsequently administered our expert questionnaire in paper form. Experts could be government officials, nutrition experts, regional experts or NGO staff. Our sample inter alia includes employees of UNICEF, NDMA and government officials from the Departments of Livestock and Agriculture and Education. During the interview we mostly focused on a modeling exercise to visualize the mental models of the experts, including factors at the household level that could affect nutritional outcomes, the causes and consequences of malnutrition, and how these factors relate to each other. Experts were provided with flashcards to visualize their ideas. The standardized questionnaire could be filled in and handed over to ACF Kenya, who assisted us with the collection.

B.4 Quantitative model validation procedures
Model estimations, in terms of effect size and significance, are based on a trade-off between Type I and Type II errors, which translate into falsely rejecting and falsely failing to reject the null hypothesis that our model does not explain the empirical malnutrition prevalence rates. We discuss this trade-off in the context of internal validity (the correspondence between optimized model simulations and empirically observed data) and external validity (the correspondence between optimized model simulations and empirically observed data for which the model was not optimized). The quantitative assessment of model fit, both for calibration and validation, includes systematic estimations of bias, such as the root mean-squared deviation. In addition, metrics for predictive accuracy, such as the F1-score and the Hamming loss function, allow us to assess the degree to which empirical variation can be explained by the model. We use these measures of in-sample validity and predictive power as joint indicators of our model's ability to provide valid forecasts.

B.4.1 In-sample validity
We assess the in-sample validity of our model by optimizing the fully specified and refined model, and comparing simulation results against a set of target statistics from empirical data. We use the root mean-squared deviation (RMSD) to measure the degree of deviation between simulated and empirical MUAC statistics per administrative unit for a given observation period, where x is the simulated average MUAC and y the empirically observed MUAC, normalized by the population density estimate p, for administrative unit i.
Complementing the measurement of error, we measure the in-sample accuracy of the model. Specifically, we consider the degree to which the model correctly predicts MUAC categories as a measure of goodness of fit (GOF). To this end, we transform empirical and simulated prevalence outcomes into multiple categories and use both a generalized F1-score and a normalized Hamming loss function to calculate the fraction of categories that are correctly and incorrectly predicted, respectively. The scores range between 0 and 1, with 1 for the F1-score and 0 for the Hamming loss function indicating perfect accuracy. A potential disadvantage of the GOF approach is that we relate "success" to abstract categories that may have less meaning on the ground than a MUAC score, whereas an advantage of the approach is that it provides stricter criteria for success than correlation-based measures, since any false positive or false negative prediction is counted as failure.
Model optimization then proceeds as follows. First, through model enumeration, we identify combinations of model parameters that yield minimal RMSD and maximal GOF of simulated vs. empirical MUAC statistics. The model's parameter space p = {p_1, p_2, ..., p_i} and the range of values for each parameter are informed both by insights from expert interviews and fieldwork, as well as data availability for each case.
As a general rule, model optimization is used to obtain estimates for those parameters that (i) are of central relevance for the mechanism in question, and (ii) are otherwise empirically unobservable. We optimize the following parameters: the spawn rate p_1v = [0.01, ..., 0.1] for new strategy generation; the learning rate (λ) p_2v = [0.01, ..., 0.1], which dictates the probability that households switch between strategies; and the mode of adaptation p_3v = {"random", "hillclimbing", "reinforcement"}, which determines how households select between strategies. These parameters capture the strategic decision-making of households, a key dynamic that is not empirically observable. In addition, a set of global parameters determines how interactions are operationalized in the model: the size of the Moore neighborhood p_4v = [1, 2, 3] for a given household, and the duration of a single model simulation in model steps p_5v = [100, 500, 1000]. We assess the sensitivity of model outcomes to these choices by means of robustness checks. Second, given a range of possible parameter values for each p, the optimized model is the one for which the combination of parameter values p_1v, p_2v, ..., p_iv yields the lowest RMSD and highest GOF.
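The error and accuracy measures named in this section can be sketched as below. Two choices are assumptions rather than the specification used in the study: the population density estimate enters the RMSD as a per-unit weight, and the "generalized" F1-score is implemented as the macro-average of per-category F1-scores. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def rmsd(simulated, observed, weights=None):
    """Root mean-squared deviation across administrative units.

    weights: optional per-unit weights (assumption: population density
    estimates); defaults to equal weighting.
    """
    if weights is None:
        weights = [1.0] * len(simulated)
    squared = sum(w * (x - y) ** 2 for w, x, y in zip(weights, simulated, observed))
    return sqrt(squared / sum(weights))


def macro_f1(true, predicted):
    """Macro-averaged F1 over all categories seen in either sequence."""
    scores = []
    for c in sorted(set(true) | set(predicted)):
        tp = sum(t == c and p == c for t, p in zip(true, predicted))
        fp = sum(t != c and p == c for t, p in zip(true, predicted))
        fn = sum(t == c and p != c for t, p in zip(true, predicted))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return mean(scores)


def hamming_loss(true, predicted):
    """Fraction of units whose category is predicted incorrectly."""
    return sum(t != p for t, p in zip(true, predicted)) / len(true)
```

With single-label categories, the normalized Hamming loss is simply one minus the share of correctly classified units, which is why 0 indicates perfect accuracy.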
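Model enumeration over this parameter space can be sketched as an exhaustive grid search. The intermediate grid value 0.05, the parameter names, and the `evaluate` callback (a stand-in for running the simulation and scoring it) are assumptions for illustration. The text does not state how RMSD and GOF are combined into one ranking, so the sketch assumes RMSD is minimized first and GOF breaks ties.

```python
from itertools import product

# Assumption: a coarse grid; only the end points 0.01 and 0.1 come from the text.
GRID = {
    "spawn_rate": [0.01, 0.05, 0.1],
    "learning_rate": [0.01, 0.05, 0.1],
    "adaptation": ["random", "hillclimbing", "reinforcement"],
    "neighborhood": [1, 2, 3],
    "steps": [100, 500, 1000],
}


def enumerate_models(grid, evaluate):
    """Evaluate every parameter combination and return the best one.

    evaluate(params) -> (rmsd, gof) is a hypothetical callback that runs the
    model with `params` and scores it against empirical data.
    """
    names = list(grid)
    best_key, best_params = None, None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        error, gof = evaluate(params)
        key = (error, -gof)  # lower RMSD first, then higher GOF
        if best_key is None or key < best_key:
            best_key, best_params = key, params
    return best_params
```

The grid above yields 3^5 = 243 model runs, which illustrates why the number of values per parameter has to stay small when each run is a full simulation.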
Note that there is no consensus for evidence-driven model simulations on a cut-off value at which the degree of agreement between the optimized model and empirical outcomes is considered substantial or satisfactorily high to consider it internally valid. This is largely due to the relative novelty of the method in social science applications.

B.4.2 Out-of-sample validity
Out-of-sample validity refers to the correspondence between optimized model simulations and empirically observed data for which the model was not optimized. We assess the external validity of the model in two ways. First, we analyze the predictive power of the model by splitting the available data input I into two non-overlapping time periods: one which we use to optimize the model (I_A, the "training data" used to minimize RMSD and maximize in-sample GOF, i.e., "in-sample validation"), and one to assess the fit of the optimized model (I_B, the "out-of-sample data"). Second, we assess the extent to which the model is generalizable in terms of temporal transferability and varying data availability.
Monthly data points, as collected by NDMA, lend themselves to a temporal split. We divide the sample longitudinally to compare model results for the year 2017 to those for 2018. Note that we consider a full year, i.e., a full seasonal cycle. Given that we analyze a representative sample of wards over time, factors such as the overall dispersion of MUAC or the household behavior that produces it are closely comparable. The external distribution of shocks or seasonal patterns may vary between years and thus influence out-of-sample predictive power. We compare internal validity as based on the split sample to internal validity as based on the full, "baseline" sample discussed below. The difference in performance of the baseline model and the predictive model with empirical input data I and I_B, respectively, serves as an indicator for the model's sensitivity to different data inputs. Further, it tells us something about the trade-off between an internally valid model that explains a particular empirical pattern well, and an externally valid model that explains variation beyond the particular case it was optimized for.
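The temporal split can be illustrated with a short sketch. The record layout (dictionaries with "year" and "month" keys) and both function names are assumptions made for this example.

```python
def split_by_year(records, train_year=2017, test_year=2018):
    """Split monthly records into non-overlapping training and test periods."""
    train = [r for r in records if r["year"] == train_year]
    test = [r for r in records if r["year"] == test_year]
    return train, test


def covers_full_cycle(records):
    """Check that a period contains all twelve months, i.e. a full seasonal cycle."""
    return {r["month"] for r in records} == set(range(1, 13))
```

Checking `covers_full_cycle` on both halves before optimizing guards against comparing a complete seasonal cycle with a truncated one.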

B.4.3 True out-of-sample validity
In addition to in-sample and standard split-sample validation, we assess the model's true out-of-sample predictive power. In other words: how well can the model forecast the future? Forecasts into the future are "true" forecasts that are made for time periods beyond the end of the available data. Model performance as reported for in-sample and split-sample validity is likely overly optimistic. This is because data in the estimation period are used to help select the model and to estimate its parameters, i.e., the model uses data on both sides of each observation to determine the forecast. A true out-of-sample analysis follows a different logic. We estimate the model based on data for some month t and then construct a forecast of acute malnutrition prevalence rates for months t + 2 and t + 4. After 2 or 4 months we record the forecast error, re-estimate the model, and make new forecasts for each time horizon, and so forth. The result is a set of true out-of-sample forecast errors that provide a robust assessment of model performance. Note that the specification and implementation of such models is demanding, given that they must be re-optimized for each new prediction interval. Forecasting the future in real time (so-called "leading-edge" predictions) requires all the available data for estimation, so that the most recent data is used. In short, it is difficult to properly validate our models if data is in short supply or non-existent. NDMA collects large quantities of nutrition data on a monthly basis. The quality of the data (its accuracy, resolution, and coverage) is far from satisfying the need for near real-time surveillance required by stakeholders and decision-makers. Yet Kenya is a comparatively data-rich case study, allowing us to explore the predictive power of the model on data it was not optimized for. Using NDMA data for the year 2019, we test three separate leading-edge prediction periods. In each 2019 period, (i) Jan. to Apr., (ii) May to Aug., and (iii) Sep. to Dec., we test both 2- and 4-month prediction horizons.
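The forecast-then-re-estimate logic described above corresponds to a rolling-origin evaluation, sketched below under simplifying assumptions. The `fit` callback is a hypothetical stand-in for re-optimizing the simulation model, the sketch re-estimates at every month rather than only after each horizon has elapsed, and errors are recorded as absolute deviations.

```python
def rolling_origin_errors(series, fit, horizons=(2, 4)):
    """Collect true out-of-sample forecast errors.

    series: observed prevalence rates, one value per month.
    fit(history) -> forecaster, where forecaster(h) returns the forecast
        h months beyond the end of `history` (hypothetical interface).
    Returns {horizon: [absolute forecast errors]}.
    """
    errors = {h: [] for h in horizons}
    for t in range(len(series)):
        # Re-estimate using data up to and including month t only.
        forecaster = fit(series[: t + 1])
        for h in horizons:
            if t + h < len(series):  # the target month must be observed
                errors[h].append(abs(forecaster(h) - series[t + h]))
    return errors


def fit_persistence(history):
    """Placeholder model: forecast the last observed value at every horizon."""
    last = history[-1]
    return lambda h: last
```

Because each forecast uses only data available at month t, no observation after the forecast origin can leak into model selection or parameter estimation.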

B.4.4 Joint validity
The validity of a model is given by the joint ability to simulate empirically observed outcomes in-sample and provide out-of-sample predictive power. Specifically, a model "successfully" simulates empirical outcomes if both in-sample validity and out-of-sample predictive power place the model in the upper-right quadrant, as highlighted in Figure 5 in the main text.
We previously noted the absence of an established standard to assess the empirical validity of computational models in the social sciences, and have advised against treating any model result as valid at face value based on any single criterion. As also mentioned, the random prediction baseline for multiple categories is much lower than the standard baseline of 0.5 for binary prediction tasks, given that predicting a larger set of categories correctly, i.e., the five IPC categories, is systematically harder than just classifying, for example, whether a famine condition exists or not. Accounting for the fact that observations are unevenly distributed across different prevalence categories in our empirical setting, we found the random prediction baseline in these cases to lie between GOF values of 0.3 and 0.4 (GOF here as measured by F1-score). We therefore set our joint validity criteria such that a model with GOF > 0.5 for internal validity and GOF > 0.5 for predictive power is considered successful in this more complex prediction setting.
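To illustrate why the random baseline falls below 0.5 when categories are unevenly distributed, the sketch below computes the expected accuracy of a guesser that draws categories in proportion to their observed frequencies, which equals the sum of squared category shares. This uses the fact that micro-averaged F1 equals accuracy for single-label classification; whether the study's baseline was computed this way is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def random_baseline(labels):
    """Expected accuracy of frequency-proportional random guessing.

    For category shares p_c, a guess matches the truth with
    probability sum_c p_c ** 2.
    """
    n = len(labels)
    return sum((count / n) ** 2 for count in Counter(labels).values())


def jointly_valid(gof_in_sample, gof_out_of_sample, threshold=0.5):
    """Joint validity: both GOF values must exceed the threshold."""
    return gof_in_sample > threshold and gof_out_of_sample > threshold
```

For a made-up sample with category shares of 30%, 20%, and 50%, the baseline is 0.09 + 0.04 + 0.25 = 0.38, so a threshold of 0.5 sits clearly above chance in such a setting.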

C.3.2 Projections
In order to arrive at the 4-month leading-edge predictions for the year 2020, we estimate the model based on data for the prior year and then project the risk of acute malnutrition prevalence rates for April through July 2020. The corresponding Figure 7 in the main manuscript shows the empirical prevalence rates for 2019 up until March 2020 and three separate projections for the three scenarios described above (Table S4), where we provide the low and high λ as well as the baseline estimate for each. In Figure S8 we show the same projections, including two other wards covered by the NDMA data.
Figure S8: West Pokot: Counterfactual Scenarios for Alternative Wards (2020).